EveTAR: Building a Large-Scale Multi-Task Test Collection over Arabic Tweets
This article introduces a new language-independent approach for creating a
large-scale high-quality test collection of tweets that supports multiple
information retrieval (IR) tasks without running a shared-task campaign. The
adopted approach (demonstrated over Arabic tweets) designs the collection
around significant (i.e., popular) events, which enables the development of
topics that represent frequent information needs of Twitter users for which
rich content exists. That inherently facilitates the support of multiple tasks
that generally revolve around events, namely event detection, ad-hoc search,
timeline generation, and real-time summarization. The key highlights of the
approach include diversifying the judgment pool via interactive search and
multiple manually-crafted queries per topic, collecting high-quality
annotations via crowd-workers for relevancy and in-house annotators for
novelty, filtering out low-agreement topics and inaccessible tweets, and
providing multiple subsets of the collection for better availability. Applying
our methodology to Arabic tweets resulted in EveTAR, the first
freely-available tweet test collection for multiple IR tasks. EveTAR includes a
crawl of 355M Arabic tweets and covers 50 significant events for which about
62K tweets were judged with substantial average inter-annotator agreement
(Kappa value of 0.71). We demonstrate the usability of EveTAR by evaluating
existing algorithms in the respective tasks. Results indicate that the new
collection can support reliable ranking of IR systems that is comparable to
similar TREC collections, while providing strong baseline results for future
studies over Arabic tweets
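The reported agreement figure is a Cohen's kappa, which can be computed from two annotators' labels as below. A minimal sketch, assuming binary relevant/not-relevant judgments; the example labels are illustrative, not EveTAR data:

```python
# Cohen's kappa: observed agreement corrected for chance agreement,
# the statistic behind EveTAR's reported 0.71.
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of items both annotators label the same.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each annotator's label marginals.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

a = [1, 1, 0, 1, 0, 0, 1, 1]  # hypothetical annotator 1
b = [1, 1, 0, 0, 0, 0, 1, 1]  # hypothetical annotator 2
print(cohens_kappa(a, b))
```

Values above roughly 0.6 are conventionally read as substantial agreement, which is why the 0.71 figure supports the collection's label quality.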
Humans optional? Automatic large-scale test collections for entity, passage, and entity-passage retrieval
Manually creating test collections is a time-, effort-, and cost-intensive process. This paper describes a fully automatic alternative for deriving large-scale test collections, where no human assessments are needed. The empirical experiments confirm that automatic test collection and manual assessments agree on the best performing systems. The collection includes relevance judgments for both text passages and knowledge base entities. Since test collections with relevance data for both entity and text passages are rare, this approach provides a cost-efficient way for training and evaluating ad hoc passage retrieval, entity retrieval, and entity-aware text retrieval methods
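Agreement between system rankings under automatic and manual judgments is conventionally measured with a rank correlation such as Kendall's tau. A minimal sketch; the statistic and the scores below are illustrative assumptions, not figures from the paper:

```python
# Kendall's tau over two score lists for the same systems:
# +1 means identical rankings, -1 means fully reversed.
from itertools import combinations

def kendall_tau(scores_a, scores_b):
    """Rank correlation between two evaluations of the same systems."""
    concordant = discordant = 0
    for i, j in combinations(range(len(scores_a)), 2):
        s = (scores_a[i] - scores_a[j]) * (scores_b[i] - scores_b[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(scores_a) * (len(scores_a) - 1) // 2
    return (concordant - discordant) / n_pairs

manual = [0.31, 0.27, 0.42, 0.18]     # hypothetical scores under manual qrels
automatic = [0.29, 0.30, 0.45, 0.15]  # same systems under automatic qrels
print(kendall_tau(manual, automatic))
```

Note that the best system (index 2) is ranked first under both sets of judgments even though the lower-ranked systems swap, which matches the paper's claim that the two evaluations agree on the best performers.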
Relevance models to help estimate document and query parameters
A central idea of language models is that documents (and perhaps queries) are random variables, generated by data-generating functions that are characterized by document (query) parameters. The key new idea of this paper is that a relevance judgment is also generated stochastically, and that its data-generating function is governed by those same document and query parameters. The result of this addition is that any available relevance judgments are easily incorporated as additional evidence about the true document and query model parameters. An additional aspect of this approach is that it also resolves the long-standing problem of document-oriented versus query-oriented probabilities. The general approach can be used with a wide variety of hypothesized distributions for documents, queries, and relevance. We test the approach on Reuters Corpus Volume 1, using one set of possible distributions. Experimental results show that the approach does succeed in incorporating relevance data to improve estimates of both document and query parameters, but on this data and for the specific distributions we hypothesized, performance was no better than two separate one-sided models. We conclude that the model's theoretical contribution is its integration of relevance models, document models, and query models, and that the potential for additional performance improvement over one-sided methods requires refinements
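The central idea above can be sketched as a single joint log-likelihood in which document terms, query terms, and a relevance judgment are all observations generated from the same parameters. The multinomial/logistic choices and every name below are illustrative assumptions, not the paper's hypothesized distributions:

```python
# Toy joint likelihood: text is generated multinomially from the
# document/query parameters, and the relevance judgment is an extra
# observation governed by the SAME parameters.
import math

def log_likelihood(theta_d, theta_q, doc_counts, query_counts, r):
    # Term occurrences generated from the document and query parameters.
    ll = sum(c * math.log(theta_d[t]) for t, c in doc_counts.items())
    ll += sum(c * math.log(theta_q[t]) for t, c in query_counts.items())
    # Relevance judgment r (0/1): Bernoulli with probability given by a
    # logistic function of the parameters' inner product (an assumption).
    sim = sum(theta_d[t] * theta_q[t] for t in theta_d)
    p_rel = 1.0 / (1.0 + math.exp(-sim))
    ll += math.log(p_rel if r == 1 else 1.0 - p_rel)
    return ll

theta = {"arabic": 0.5, "tweets": 0.3, "ir": 0.2}  # hypothetical parameters
print(log_likelihood(theta, theta, {"arabic": 2}, {"ir": 1}, 1))
```

Because the judgment term shares parameters with the text terms, maximizing this likelihood lets relevance data pull on the document and query estimates at once, which is the integration the abstract describes.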
Relevance for browsing, relevance for searching
The concept of relevance has received a great deal of theoretical attention. Separately, the relationship between focused search and browsing has also received extensive theoretical attention. This article aims to integrate these two literatures with a model and an empirical study that relate relevance in focused searching to relevance in browsing. Some factors affect both kinds of relevance in the same direction; others affect them in different ways. In our empirical study, we find that the latter factors dominate, so that there is actually a negative correlation between the probability of a document's relevance to a browsing user and its probability of relevance to a focused searcher
A new unified probabilistic model
This paper proposes a new unified probabilistic model. Two previous models, Robertson et al.'s "Model 0" and "Model 3," each have strengths and weaknesses. The strength of Model 0 not found in Model 3 is that it does not require relevance data about the particular document or query, and, related to that, its probability estimates are straightforward. The strength of Model 3 not found in Model 0 is that it can utilize feedback information about the particular document and query in question. The new model combines these strengths: the expression of its probabilities is straightforward, it does not require that data be available for the particular document or query in question, but it can utilize such specific data if it is available. The model is one way to resolve the difficulty of combining two marginal views in probabilistic retrieval
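The combined behaviour can be illustrated with a toy smoothed estimate that works without pair-specific data but shifts toward it when it exists. The beta-style prior and all names are illustrative, not the paper's model:

```python
# Toy fallback estimate of P(relevant) for one (document, query) pair.
def p_relevant(feedback, prior=0.1, strength=5.0):
    """feedback: list of 0/1 judgments for this pair (may be empty)."""
    # With no feedback this reduces to the prior (Model 0 behaviour);
    # with feedback it moves toward the observed rate (Model 3 behaviour).
    return (strength * prior + sum(feedback)) / (strength + len(feedback))

print(p_relevant([]))         # no specific data: just the prior
print(p_relevant([1, 1, 1]))  # positive feedback pulls the estimate up
```

The single formula covers both regimes, which is the unification the abstract describes: no special case is needed for the pair having or lacking feedback.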
Relevance data for language models using maximum likelihood
We present a preliminary empirical test of a maximum likelihood approach to using relevance data for training information retrieval (IR) parameters. Similar to language models, our method uses explicitly hypothesized distributions for documents and queries, but we add to this an explicitly hypothesized distribution for relevance judgments. The method unifies document-oriented and query-oriented views. Performance is better than the Rocchio heuristic for document and/or query modification. The maximum likelihood methodology also motivates a heuristic estimate of the MLE optimization. The method can be used to test competing hypotheses regarding the processes of authors' term selection, searchers' term selection, and assessors' relevancy judgments
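The Rocchio heuristic used as the baseline above moves the query vector toward relevant documents and away from nonrelevant ones. A minimal sketch over plain term-weight vectors; the alpha/beta/gamma values are conventional defaults, not the paper's settings:

```python
# Rocchio relevance feedback over term-weight vectors.
def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return the modified query vector after one feedback round."""
    def centroid(vecs):
        return [sum(col) / len(vecs) for col in zip(*vecs)]
    q = [alpha * w for w in query]
    if relevant:       # pull toward the centroid of relevant docs
        q = [w + beta * c for w, c in zip(q, centroid(relevant))]
    if nonrelevant:    # push away from the centroid of nonrelevant docs
        q = [w - gamma * c for w, c in zip(q, centroid(nonrelevant))]
    return [max(w, 0.0) for w in q]  # negative weights are usually dropped

q = [1.0, 0.0, 0.0]                          # hypothetical 3-term query
rel = [[0.0, 1.0, 0.0], [0.0, 1.0, 1.0]]     # judged relevant
non = [[0.0, 0.0, 1.0]]                      # judged nonrelevant
print(rocchio(q, rel, non))
```

Unlike the maximum likelihood approach, the weights here are set by convention rather than estimated from the data, which is the contrast the abstract draws.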
Web metadata standards: Observations and prescriptions
Various Web content metadata standards and observations on their development efforts were reviewed. Content metadata standards were built on top of infrastructure standards that standardize metadata representation and exchange. Product metadata standards standardize descriptions of physical products, data, information resources, and documents. A metadata standard should define concepts at a metadata model (M1) layer, which offers relatively low language abstraction. To help ensure metadata success, it was suggested that users be educated and that practical tools be developed to support their search and navigation through classification and thesaurus hierarchies
A unified maximum likelihood approach to document retrieval
Empirical work shows significant benefits from using relevance feedback data to improve information retrieval (IR) performance. Still, one fundamental difficulty has limited the ability to fully exploit this valuable data. The problem is that it is not clear whether the relevance feedback data should be used to train the system about what the users really mean, or about what the documents really mean. In this paper, we resolve the question using a maximum likelihood framework. We show how all the available data can be used to simultaneously estimate both documents and queries in proportions that are optimal in a maximum likelihood sense. The resulting algorithm is directly applicable to many approaches to IR, and the unified framework can help explain previously reported results as well as guide the search for new methods that utilize feedback data in IR
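The simultaneous-estimation idea can be illustrated as a toy gradient ascent that updates document and query representations together from relevance feedback triples. The logistic model form, the vectors, and all parameters below are illustrative assumptions, not the paper's framework:

```python
# Toy joint estimation: fit document and query vectors so that judged
# (doc, query, relevance) triples are likely under a logistic model.
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def fit(judgments, n_docs, n_queries, dim=2, steps=500, lr=0.5, seed=0):
    rng = random.Random(seed)
    docs = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(n_docs)]
    queries = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(n_queries)]
    for _ in range(steps):
        for d, q, r in judgments:  # r = 1 relevant, 0 nonrelevant
            p = sigmoid(sum(a * b for a, b in zip(docs[d], queries[q])))
            g = r - p  # gradient of the log-likelihood w.r.t. the dot product
            for k in range(dim):  # both sides move: docs AND queries
                docs[d][k], queries[q][k] = (
                    docs[d][k] + lr * g * queries[q][k],
                    queries[q][k] + lr * g * docs[d][k],
                )
    return docs, queries

# Hypothetical feedback: query 0 matches doc 0 only, query 1 matches doc 1.
judgments = [(0, 0, 1), (1, 0, 0), (0, 1, 0), (1, 1, 1)]
docs, queries = fit(judgments, n_docs=2, n_queries=2)
score = lambda d, q: sum(a * b for a, b in zip(docs[d], queries[q]))
# After fitting, relevant pairs should score above nonrelevant ones.
print(score(0, 0) > score(1, 0), score(1, 1) > score(0, 1))
```

Because a single objective drives both sets of parameters, the feedback is not forced into either the "what the user meant" or the "what the document meant" side alone, which is the resolution the abstract claims.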
Effectiveness of keyword-based display and selection of retrieval results for interactive searches